Building Corpora for the Philological Study of Swiss Legal Texts

Authors

  • Stefan Höfler
  • Michael Piotrowski
Abstract

We describe the construction of two corpora in the domain of Swiss legal texts. The DS21 corpus is based on the Collection of Swiss Law Sources and contains historical legal texts from the early Middle Ages up to 1798; the Swiss Legislation Corpus (SLC) is based on the Classified Compilation of Swiss Federal Legislation and contains all current Swiss federal laws. The paper summarizes the key properties of both corpora, discusses issues encountered while building them, and outlines some applications.


Similar resources

Color Dictionaries and Corpora

In the study of linguistics, a corpus is a data set of naturally occurring language (speech or writing) that can be used to generate or test linguistic hypotheses. The study of color naming worldwide has been carried out using three types of data sets: (1) corpora of empirical color-naming data collected from native speakers of many languages; (2) scholarly data sets where the color terms are o...


Comparative Study of the Academic Vocabulary Content of Electronic Engineering Corpora, GE Materials and M.S. Entrance Examinations

The importance of vocabulary learning has been underlined in the field of English for Academic Purposes (EAP) because non-English majors who require reading English texts in their fields of study have to expand their English vocabulary knowledge much more efficiently than ordinary ESL/EFL learners. Since academic vocabulary instruction in Iranian universities is realized through the use of Gene...


Legal Terms and Word Sketches: A Case Study

In this paper we describe an approach to the semiautomatic identification of legal terms in Czech texts. Our general goal is to offer supplementary tools for building a dictionary of Czech law terms. First, we used the VaDis partial parser to recognize complex nominal constructions in a legal text: the current version of the Penal Code of the Czech Republic. Headwords of the recogniz...


Syntactic Complexity of Russian Unified State Exam Texts in English: A Study on Reliability and Validity

In this study we analyze texts used in the Russian Unified State Exam (USE) in English. The texts, which formed small research corpora, were retrieved from two resources: the official USE database as a reference point, and “Neznaika” (https://neznaika.pro/), a popular website used by pupils for USE training. The sizes of the two corpora are balanced: USE has 11934 tokens and “Neznaika” has 11918 tokens. We share Biber’...


From Historic Books to Annotated XML: Building a Large Multilingual Diachronic Corpus

This paper introduces our approach towards annotating a large heritage corpus, which spans over 100 years of alpine literature. The corpus consists of over 16.000 articles from the yearbooks of the Swiss Alpine Club, 60% of which represent German texts, 38% French, 1% Italian and the remaining 1% Swiss German and Romansh. The present work describes the inherent difficulties in processing a mult...



Journal:
  • JLCL

Volume 26


Publication date: 2011